Multi-Label classification for Mining Big Data
Authors
Abstract
Mining big data requires handling the problem under investigation in a way that achieves accuracy and speed at the same time. In this research we investigate multi-label classification problems, aiming for better accuracy in a timely fashion. Label dependencies are the biggest factor influencing performance, directly and indirectly, and they distinguish multi-label problems from multi-class problems. The key objective in multi-label learning is to exploit these dependencies effectively. Most current research either ignores the correlation between labels or develops complex algorithms that do not scale efficiently to large datasets. Hence, the goal of our research is to propose a fundamental solution that explicitly identifies dependencies and correlations between labels in large multi-label datasets as a preliminary step. This is done with an association rule mining algorithm before any classifiers are induced. The dependencies discovered in this step are then used to divide the problem into subsets, according to the correlation between labels, for parallel classification. The experimental results were evaluated using Accuracy, Hamming Loss, Micro F-Measure and Subset Accuracy on a variety of datasets. The proposed model readily exploits all correlations among labels in multi-label datasets, which facilitates multi-label classification and improves accuracy while reducing running time.
Keywords: multi-label classification; data mining; big data analytics
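The two-stage idea in the abstract (mine label dependencies first, then split the labels into correlated subsets) can be illustrated with a minimal sketch. The abstract does not say which association rule mining algorithm is used, so the code below substitutes simple pairwise rules scored by support and confidence. The function names, the thresholds `min_support` and `min_confidence`, and the toy label sets are all assumptions made for illustration, not the authors' implementation. Two of the reported metrics, Hamming Loss and Subset Accuracy, are included with their standard definitions.

```python
from itertools import combinations


def label_groups(label_sets, min_support=0.3, min_confidence=0.6):
    """Group labels linked by pairwise association rules.

    Hypothetical helper: a simplified stand-in for a full association
    rule miner. Two labels are linked when their co-occurrence passes
    both thresholds; groups are the connected components of the links.
    """
    n = len(label_sets)
    labels = sorted(set().union(*label_sets))
    count = {label: 0 for label in labels}
    pair_count = {}
    for labelset in label_sets:
        for label in labelset:
            count[label] += 1
        for a, b in combinations(sorted(labelset), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    # Union-find over labels: each linked pair is merged into one group.
    parent = {label: label for label in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), c in pair_count.items():
        support = c / n
        # Confidence of the stronger of the two rules a -> b and b -> a.
        confidence = max(c / count[a], c / count[b])
        if support >= min_support and confidence >= min_confidence:
            parent[find(a)] = find(b)

    groups = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    return sorted(groups.values())


def hamming_loss(y_true, y_pred, n_labels):
    """Fraction of (instance, label) decisions that are wrong."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * n_labels)


def subset_accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label set matches exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy label sets (invented for illustration only).
Y = [
    {"beach", "sea"},
    {"beach", "sea"},
    {"beach", "sea", "sunset"},
    {"mountain", "snow"},
    {"mountain", "snow"},
    {"sunset"},
]
print(label_groups(Y))
# [['beach', 'sea'], ['mountain', 'snow'], ['sunset']]
```

Each resulting group could then be handed to its own classifier and trained in parallel, which is the decomposition the abstract describes. How the authors actually train and combine the per-subset classifiers is not stated in the abstract.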
Similar resources
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in the text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
A Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy
Text classification, the task of assigning metadata to documents, requires significant time and effort when performed by humans. Moreover, with online-generated content growing explosively, manually annotating large-scale, unstructured data becomes a challenge. Currently, many state-of-the-art text mining methods have been applied to the classification process, many of them based on the key wor...
A Knowledge Based Approach for Tackling Mislabeled Multi-class Big Social Data
The performance of classification models relies heavily on the quality of training data. However, label imperfection is an inherent fault of training data, and it is impossible to handle manually in a big data environment. Various methods have been proposed to remove label noise in order to improve classification quality, with the side effect of reducing the volume of data. In this paper, we propose a...
Multi-Objective Model for Fair Pricing of Electricity Using the Parameters from the Iran Electricity Market Big Data Analysis
Assessment of the electricity market shows that electricity market data can be considered "big data". This data has been analyzed by both conventional and modern data mining methods. The predicted variables of supply and demand are taken as the input of a defined multi-objective model for predicting electricity price, which is the output of that model. This shows the advantage of appl...
Journal title:
Volume, issue:
Pages: -
Publication date: 2015